An Exact Implicit Enumeration Algorithm for Variable Selection in Multiple Linear Regression Models Using Information Criteria

Author

  • Dennis Beal
Abstract

For large multivariate data sets, the data analyst often wants to know the best set of independent regressors to use in a multiple linear regression model. Akaike's Information Criterion (AIC) is one information criterion calculated in SAS that is used to score a model. For a small number of independent variables p, an explicit enumeration of all 2^p possible models is feasible. However, for large multivariate data sets where p is large, an explicit enumeration of all possible models becomes computationally intractable. This paper presents SAS code that implements the exact implicit enumeration algorithm authored by Bao (2005), which has been shown to always arrive at the globally minimum AIC value when allowed to run to completion. The number of models evaluated to find the model with the smallest AIC score is shown to be much smaller than in an explicit enumeration of all possible models. A large multivariate data set is simulated with a known true model to demonstrate how quickly the exact implicit enumeration algorithm arrives at the true model. The number of models evaluated is compared with that of an explicit enumeration algorithm and the REG procedure in SAS. This paper is for intermediate users of SAS/STAT who understand multivariate data analysis and SAS macros.
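The contrast between explicit and implicit enumeration described above can be illustrated with a minimal Python sketch. This is not the SAS code from the paper and not Bao's (2005) algorithm, whose details are not given here; it is a generic branch-and-bound search swapped in for illustration. It assumes the AIC form n·ln(SSE/n) + 2k, with k counting the intercept, and all function names (`sse`, `aic`, `explicit_enumeration`, `implicit_enumeration`) are hypothetical. The pruning bound relies on two facts: adding regressors never increases SSE, and the penalty term never falls below that of the variables already forced into the model.

```python
import itertools
import numpy as np

def sse(X, y, cols):
    """Residual sum of squares of an OLS fit with intercept on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return float(resid @ resid)

def aic(X, y, cols):
    """Assumed AIC form: n*ln(SSE/n) + 2k, where k includes the intercept."""
    n = len(y)
    return n * np.log(sse(X, y, cols) / n) + 2 * (len(cols) + 1)

def explicit_enumeration(X, y):
    """Score every one of the 2^p subsets; returns (best_cols, best_aic, fits)."""
    p = X.shape[1]
    best_score, best_cols, fits = np.inf, (), 0
    for r in range(p + 1):
        for cols in itertools.combinations(range(p), r):
            fits += 1
            score = aic(X, y, cols)
            if score < best_score:
                best_score, best_cols = score, cols
    return best_cols, best_score, fits

def implicit_enumeration(X, y):
    """Generic branch and bound (illustrative, NOT Bao's algorithm)."""
    n, p = X.shape
    state = {"score": np.inf, "cols": (), "fits": 0}

    def search(forced, free):
        # Lower bound for every model M with forced <= M <= forced + free:
        # SSE(M) >= SSE(forced + free) and k(M) >= len(forced) + 1.
        state["fits"] += 1
        bound = n * np.log(sse(X, y, forced + free) / n) + 2 * (len(forced) + 1)
        if bound >= state["score"]:
            return  # no model in this subtree can beat the incumbent
        if not free:
            # Leaf: the bound is exactly the AIC of the model `forced`.
            state["score"], state["cols"] = bound, forced
            return
        j, rest = free[0], free[1:]
        search(forced + (j,), rest)  # branch: include variable j
        search(forced, rest)         # branch: exclude variable j

    search((), tuple(range(p)))
    return state["cols"], state["score"], state["fits"]

# Simulated data with a known true model (columns 0, 3 and 5).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 1.5 * X[:, 5] + rng.normal(size=200)

print(explicit_enumeration(X, y))
print(implicit_enumeration(X, y))
```

Both functions return the selected columns, the AIC score, and the number of least-squares fits performed, so the work done by each search can be compared directly on the simulated data. The sketch assumes more observations than regressors and a nonzero residual, since a perfect fit would make the logarithm undefined.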

Similar resources

A generalized implicit enumeration algorithm for a class of integer nonlinear programming problems

Presented here is a generalization of the implicit enumeration algorithm that can be applied when the objective function is being maximized and can be rewritten as the difference of two non-decreasing functions. Also developed is a computational algorithm, named linear speedup, to use whatever explicit linear constraints are present to speed up the search for a solution. The method is easy to u...

Comprehensive causal analysis of occupational accidents’ severity in the chemical industries; A field study based on feature selection and multiple linear regression techniques

Introduction: The causal analysis of occupational accidents’ severity in the chemical industries may improve safety design programs in these industries. This comprehensive study was implemented to analyze the factors affecting occupational accidents’ severity in the chemical industries. Methods and Materials: An analytical study was conducted in 22 chemical industries during 2016-2017. The stu...

Penalized Bregman Divergence Estimation via Coordinate Descent

Variable selection via penalized estimation is appealing for dimension reduction. For penalized linear regression, Efron, et al. (2004) introduced the LARS algorithm. Recently, the coordinate descent (CD) algorithm was developed by Friedman, et al. (2007) for penalized linear regression and penalized logistic regression and was shown to gain computational superiority. This paper explores...

An Overview of the New Feature Selection Methods in Finite Mixture of Regression Models

Variable (feature) selection has attracted much attention in contemporary statistical learning and recent scientific research. This is mainly due to the rapid advancement in modern technology that allows scientists to collect data of unprecedented size and complexity. One type of statistical problem in such applications is concerned with modeling an output variable as a function of a sma...

A zero one programming model for RNA structures with arclength ≥ 4

In this paper, we consider RNA structures with arc-length ≥ 4. First, we represent these structures as matrix models and zero-one linear programming problems. Then, we obtain an optimal solution for this problem using an implicit enumeration method. The optimal solution corresponds to an RNA structure with the maximum number of hydrogen bonds.

Publication date: 2011